fix: arithmetic overflow in mDNS TTL implementation #5967

Open
momoshell wants to merge 3 commits into master

Conversation

momoshell

Description

This PR fixes possible arithmetic overflows in the mDNS implementation when processing response packets with very large TTL values.

Fixes #5943

Notes & open questions

I’ve kept the source in its original form and added a new filter combinator. This is a minimal change and, in my opinion, the cleanest solution.

Change checklist

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • A changelog entry has been made in the appropriate crates

@momoshell changed the title from "Fixed arithmetic overflow and added unit test" to "fix: arithmetic overflow in mDNS TTL implementation" on Mar 31, 2025
@elenaf9 (Member) left a comment

Two nits, rest LGTM. Thanks!

Comment on lines 181 to 182
.filter(move |peer| peer.id() != &local_peer_id)
.filter(move |peer| now.checked_add(peer.ttl()).is_some())

Suggested change
-.filter(move |peer| peer.id() != &local_peer_id)
-.filter(move |peer| now.checked_add(peer.ttl()).is_some())
+.filter(move |peer| peer.id() != &local_peer_id && peer.ttl() > now)

No need for multiple Filter wrappers.

@elenaf9 (Member) commented Apr 5, 2025

Sorry, just saw that my above suggestion doesn't make any sense. I meant:

Suggested change
-.filter(move |peer| peer.id() != &local_peer_id)
-.filter(move |peer| now.checked_add(peer.ttl()).is_some())
+.filter(move |peer| peer.id() != &local_peer_id && now.checked_add(peer.ttl()).is_some())

Still, I am not a big fan of calculating the now + peer.ttl() sum twice. Why not use filter_map and return the now.checked_add(peer.ttl()) Option so it can be used directly in the next step?
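
For example, something like this (untested sketch; `extract_discovered` as in this PR's diff):

pub(crate) fn extract_discovered(
    &self,
    now: Instant,
    local_peer_id: PeerId,
) -> impl Iterator<Item = (PeerId, Multiaddr, Instant)> + '_ {
    self.discovered_peers()
        .filter(move |peer| peer.id() != &local_peer_id)
        // Compute `now + ttl` once; `checked_add` returns `None` on
        // overflow, so peers with overflowing TTLs are dropped here.
        .filter_map(move |peer| Some((peer, now.checked_add(peer.ttl())?)))
        .flat_map(move |(peer, new_expiration)| {
            let observed = self.observed_address();
            peer.addresses().iter().filter_map(move |address| {
                let new_addr = _address_translation(address, &observed)?;
                let new_addr = new_addr.with_p2p(*peer.id()).ok()?;
                Some((*peer.id(), new_addr, new_expiration))
            })
        })
}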

@momoshell (Author) commented Apr 4, 2025

Hey hey, thank you for your responses.
After reviewing the proposed changes, I realized something. It might be better not to filter out peers with large TTLs. A more effective solution could be to cap the new_expiration value to a reasonable limit, such as one day.

// Use a reasonable maximum - 1 day should be sufficient for most scenarios
const MAX_TTL: Duration = Duration::from_secs(60 * 60 * 24);

pub(crate) fn extract_discovered(
    &self,
    now: Instant,
    local_peer_id: PeerId,
) -> impl Iterator<Item = (PeerId, Multiaddr, Instant)> + '_ {
    self.discovered_peers()
        .filter(move |peer| peer.id() != &local_peer_id)
        // Remove this filter to keep all peers
        // .filter(move |peer| now.checked_add(peer.ttl()).is_some())
        .flat_map(move |peer| {
            let observed = self.observed_address();
            // Cap expiration time to avoid overflow
            let new_expiration = match now.checked_add(peer.ttl()) {
                Some(expiration) => expiration,
                // Fall back to `now` if even now + MAX_TTL overflows (extremely unlikely)
                None => now.checked_add(MAX_TTL).unwrap_or(now),
            };

            peer.addresses().iter().filter_map(move |address| {
                let new_addr = _address_translation(address, &observed)?;
                let new_addr = new_addr.with_p2p(*peer.id()).ok()?;
                Some((*peer.id(), new_addr, new_expiration))
            })
        })
}

@momoshell (Author)

On the other hand, we're still not solving the essence of this problem.
And it seems to me that there might be issues when decoding in MdnsPacket::new_from_bytes.
But I guess this would require a much larger effort on the whole subject.

@elenaf9 (Member) commented Apr 5, 2025

> After reviewing the proposed changes, I realized something. It might be better not to filter out peers with large TTLs.

Can you expand on that? Why would a non-malicious peer set a TTL that is large enough to cause an overflow?

> And it seems to me that there might be issues when decoding in MdnsPacket::new_from_bytes.

What issue?

@momoshell (Author) commented Apr 6, 2025

> Can you expand on that? Why would a non-malicious peer set a TTL that is large enough to cause an overflow?

My initial reaction is that the mDNS protocol itself shouldn't be responsible for handling security. If we're implementing changes that address security concerns, we should consider handling them at a higher layer, outside the mDNS protocol itself.

Failing the entire response would be overly harsh - we'd lose all the valid data in the response because of one overly enthusiastic TTL.
And if we filter out all nodes with problematic TTLs, it could potentially make debugging more difficult in the future. That’s why I believe capping the TTL is a more reasonable short-term fix, especially while we're still investigating the root cause of this issue.

> What issue?

I think that we can't be sure whether the problematic TTLs are caused by malicious actors or if the responses were simply malformed. It seems important to investigate the creation of MdnsResponse to ensure that we're capturing the right information.

Finally, I would argue that it might be best to address this at the decoding stage rather than later in the process. This way, we can catch any malformed data earlier on and handle it more gracefully.

@elenaf9 (Member) commented Apr 21, 2025

> I think that we can't be sure whether the problematic TTLs are caused by malicious actors or if the responses were simply malformed. It seems important to investigate the creation of MdnsResponse to ensure that we're capturing the right information.
>
> Finally, I would argue that it might be best to address this at the decoding stage rather than later in the process. This way, we can catch any malformed data earlier on and handle it more gracefully.

If the packet is malformed we should discard the whole packet anyway. I don't think there is a way to recover from it. How are we supposed to know what other fields are correct?

@momoshell (Author) commented Apr 22, 2025

This is turning into more of a design decision than we expected.
We’re seeing two possible paths here:

  • Discard everything due to an overly aggressive TTL.
  • Cap the TTL, retain the data, and rely on the existing protocol implementations to handle the security aspects.

We’re still not entirely sure why TTL values are coming in this way, and it feels a bit “nuclear” to filter everything out, because it seems that there is some hidden problem somewhere.
I feel that if we filter everything out, debugging this issue becomes very hard, if not impossible.

@elenaf9 (Member) commented May 5, 2025

> I think that we can't be sure whether the problematic TTLs are caused by malicious actors or if the responses were simply malformed. It seems important to investigate the creation of MdnsResponse to ensure that we're capturing the right information.
>
> Finally, I would argue that it might be best to address this at the decoding stage rather than later in the process. This way, we can catch any malformed data earlier on and handle it more gracefully.

We are using hickory_proto::Message::from_vec to decode received packets, so this would be something to discuss at the upstream hickory-dns repo.
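
For reference, the decode step is roughly the following (minimal sketch; the `decode` wrapper and `buf` are illustrative, only `Message::from_vec` is the actual call we use):

use hickory_proto::op::Message;

// `buf` is the raw mDNS datagram read from the socket. Decoding happens
// in one step, so a malformed packet fails here as a whole rather than
// field by field.
fn decode(buf: &[u8]) -> Result<Message, hickory_proto::error::ProtoError> {
    let message = Message::from_vec(buf)?;
    for record in message.answers() {
        // `ttl()` is a u32 straight from the wire; this is the value
        // that later gets added to `Instant::now()`.
        let _ttl_secs = record.ttl();
    }
    Ok(message)
}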

We’re still not entirely sure why TTL values are coming in this way, and it feels a bit “nuclear” to filter everything out, because it seems that there is some hidden problem somewhere.
I feel that if we filter out everything, debugging this issue becomes a pretty hard thing to do, or somewhat impossible.

We could add a debug message when discarding a packet and print the discarded packet; wouldn't that solve this problem?
I don't think just capping the TTL makes debugging any easier, because then you wouldn't even discover that there was an invalid TTL (and thus potentially malformed packet) in the first place.
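
Roughly like this (untested sketch, keeping the checked_add filter from this PR and assuming we log via tracing):

.filter(move |peer| {
    let valid = now.checked_add(peer.ttl()).is_some();
    if !valid {
        // Surface the invalid TTL instead of silently dropping the record.
        tracing::debug!(peer = %peer.id(), ttl = ?peer.ttl(), "discarding peer record with overflowing TTL");
    }
    valid
})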

@momoshell (Author)

> We are using hickory_proto::Message::from_vec to decode received packets, so this would be something to discuss at the upstream hickory-dns repo.

Exactly, and this crate is still in its alpha stage. That's why I was thinking about the ease of debugging.

> We could add a debug message when discarding a packet and print the discarded packet; wouldn't that solve this problem?
> I don't think just capping the TTL makes debugging any easier, because then you wouldn't even discover that there was an invalid TTL (and thus potentially malformed packet) in the first place.

Your call on this one — I’m happy to jump in and help whichever way you go!
If I had to pick, I’d lean toward capping it and tossing in a debug message when it hits that point.
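
Something like this, reusing MAX_TTL from my snippet above (untested, and assuming tracing for the log):

let new_expiration = now.checked_add(peer.ttl()).unwrap_or_else(|| {
    // Cap at MAX_TTL, but leave a trace that the TTL was invalid.
    tracing::debug!(peer = %peer.id(), ttl = ?peer.ttl(), "TTL overflows Instant; capping at MAX_TTL");
    now.checked_add(MAX_TTL).unwrap_or(now)
});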

That said, your proposed solution works for me too — I just don’t have the same inside scoop on how you all usually like to handle these things.

Successfully merging this pull request may close these issues:

Bug: Arithmetic overflow in mDNS implementation when adding TTL to Instant